Learning Implicit Probability Distribution Functions for Symmetric Orientation Estimation from RGB Images Without Pose Labels
Object pose estimation is a necessary prerequisite for autonomous robotic
manipulation, but the presence of symmetry increases the complexity of the pose
estimation task. Existing methods for object pose estimation output a single 6D
pose and thus cannot reason about symmetries. Recently, modeling object
orientation as a non-parametric probability distribution on the SO(3) manifold
with neural networks has shown impressive results. However, acquiring
large-scale datasets to train pose estimation models remains a bottleneck. To
address this limitation, we introduce an automatic pose labeling scheme. Given
RGB-D images without object pose annotations and 3D object models, we design a
two-stage pipeline consisting of point cloud registration and
render-and-compare validation to generate multiple symmetrical
pseudo-ground-truth pose labels for each image. Using the generated pose
labels, we train an ImplicitPDF model to estimate the likelihood of an
orientation hypothesis given an RGB image. An efficient hierarchical sampling
of the SO(3) manifold enables tractable generation of the complete set of
symmetries at multiple resolutions. During inference, the most likely
orientation of the target object is estimated using gradient ascent. We
evaluate the proposed automatic pose labeling scheme and the ImplicitPDF model
on a photorealistic dataset and the T-Less dataset, demonstrating the
advantages of the proposed method.
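To make the inference step described above concrete, the following is a minimal sketch (our own illustration, not the authors' code): a coarse sampling of SO(3) followed by gradient ascent toward the most likely orientation. The learned likelihood is replaced here by a toy stand-in density peaked at a known rotation, and the sampling is random rather than the hierarchical equivolumetric grid used in the paper.

```python
import numpy as np
from scipy.spatial.transform import Rotation

TRUE = Rotation.from_euler("xyz", [0.3, -0.5, 1.0])

def log_density(rotvec):
    # Toy stand-in for the network's log-likelihood of an orientation
    # hypothesis, peaked at TRUE (geodesic distance on SO(3)).
    ang = (Rotation.from_rotvec(rotvec).inv() * TRUE).magnitude()
    return -0.5 * ang ** 2

# Coarse sampling of SO(3) (random here; the paper uses an efficient
# hierarchical sampling at multiple resolutions).
samples = Rotation.random(512, random_state=0).as_rotvec()
best = max(samples, key=log_density)

# Gradient ascent with a numerical gradient in the rotation-vector chart.
x, step, eps = np.array(best, dtype=float), 0.2, 1e-4
for _ in range(150):
    g = np.array([(log_density(x + eps * e) - log_density(x - eps * e)) / (2 * eps)
                  for e in np.eye(3)])
    x = x + step * g

err = (Rotation.from_rotvec(x).inv() * TRUE).magnitude()
print(f"angular error after refinement: {np.degrees(err):.3f} deg")
```

In the actual method, `log_density` would be the ImplicitPDF network conditioned on the RGB image, and the gradient would come from automatic differentiation rather than finite differences.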
YOLOPose V2: Understanding and Improving Transformer-based 6D Pose Estimation
6D object pose estimation is a crucial prerequisite for autonomous robot
manipulation applications. The state-of-the-art models for pose estimation are
convolutional neural network (CNN)-based. Lately, the Transformer, an
architecture originally proposed for natural language processing, has been
achieving state-of-the-art results in many computer vision tasks as well. Equipped with
the multi-head self-attention mechanism, Transformers enable simple
single-stage end-to-end architectures for learning object detection and 6D
object pose estimation jointly. In this work, we propose YOLOPose (short for
You Only Look Once Pose estimation), a Transformer-based multi-object 6D pose
estimation method based on keypoint regression, together with an improved
variant of the model. In contrast to the standard heatmap representation for predicting
keypoints in an image, we directly regress the keypoints. Additionally, we
employ a learnable orientation estimation module to predict the orientation
from the keypoints. Along with a separate translation estimation module, our
model is end-to-end differentiable. Our method is suitable for real-time
applications and achieves results comparable to state-of-the-art methods. We
analyze the role of object queries in our architecture and reveal that the
object queries specialize in detecting objects in specific image regions.
Furthermore, we quantify the accuracy trade-off of using datasets of smaller
sizes to train our model.
Comment: Robotics and Autonomous Systems Journal, Elsevier, to appear 2023.
arXiv admin note: substantial text overlap with arXiv:2205.0253
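As background for the idea of recovering orientation from regressed keypoints, here is a small sketch (our own illustration, not the paper's learnable orientation estimation module): the classical Kabsch/SVD orthogonal Procrustes fit, which finds the rotation aligning an object's canonical 3D keypoints to predicted ones.

```python
import numpy as np

def rotation_from_keypoints(canonical, predicted):
    """Best-fit rotation R with predicted ~ R @ canonical (both Nx3)."""
    A = canonical - canonical.mean(axis=0)   # center both point sets
    B = predicted - predicted.mean(axis=0)
    H = A.T @ B                              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

# Usage: rotate hypothetical canonical keypoints by a known R, recover it.
rng = np.random.default_rng(0)
canonical = rng.normal(size=(8, 3))          # e.g. 3D bounding-box corners
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
predicted = canonical @ R_true.T
R_est = rotation_from_keypoints(canonical, predicted)
print(np.allclose(R_est, R_true, atol=1e-6))
```

A learnable orientation module, as used in YOLOPose, replaces this closed-form step with a network, which lets errors backpropagate through the whole pipeline end-to-end.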
Fast Object Learning and Dual-arm Coordination for Cluttered Stowing, Picking, and Packing
Robotic picking from cluttered bins is a demanding task, for which Amazon
Robotics holds challenges. The 2017 Amazon Robotics Challenge (ARC) required
stowing items into a storage system, picking specific items, and packing them
into boxes. In this paper, we describe the entry of team NimbRo Picking. Our
deep object perception pipeline can be quickly and efficiently adapted to new
items using a custom turntable capture system and transfer learning. It
produces high-quality item segments, on which grasp poses are found. A planning
component coordinates manipulation actions between two robot arms, minimizing
execution time. The system has been demonstrated successfully at ARC, where our
team reached second place in both the picking task and the final stow-and-pick
task. We also evaluate individual components.
Comment: In: Proceedings of the International Conference on Robotics and Automation (ICRA) 201
ConvPoseCNN2: Prediction and Refinement of Dense 6D Object Poses
Object pose estimation is a key perceptual capability in robotics. We propose
a fully-convolutional extension of the PoseCNN method, which densely predicts
object translations and orientations. This has several advantages, such as
improved spatial resolution of the orientation predictions (useful in highly
cluttered arrangements), a significant reduction in parameters by avoiding
full connectivity, and fast inference. We propose and discuss several
aggregation methods for dense orientation predictions that can be applied as a
post-processing step, such as averaging and clustering techniques. We
demonstrate that our method achieves the same accuracy as PoseCNN on the
challenging YCB-Video dataset and provide a detailed ablation study of several
variants of our method. Finally, we demonstrate that the model can be further
improved by inserting an iterative refinement module into the middle of the
network, which enforces consistency of the predictions.
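One concrete way to aggregate dense per-pixel orientation predictions, sketched below under our own assumptions (the paper discusses several averaging and clustering variants), is quaternion averaging via the dominant eigenvector of the accumulator matrix, which correctly handles the q / -q sign ambiguity of unit quaternions.

```python
import numpy as np

def average_quaternions(quats, weights=None):
    """quats: (N, 4) unit quaternions; returns their weighted average (4,)."""
    q = np.asarray(quats, dtype=float)
    w = np.ones(len(q)) if weights is None else np.asarray(weights, dtype=float)
    # Accumulate the weighted outer products; the average is the
    # eigenvector of the largest eigenvalue (sign flips cancel out).
    M = (w[:, None, None] * q[:, :, None] * q[:, None, :]).sum(axis=0)
    _, vecs = np.linalg.eigh(M)              # symmetric 4x4, ascending order
    return vecs[:, -1]

# Usage: noisy copies of one quaternion, half of them with flipped sign.
rng = np.random.default_rng(0)
q_true = np.array([0.5, 0.5, 0.5, 0.5])
noisy = q_true + 0.01 * rng.normal(size=(100, 4))
noisy /= np.linalg.norm(noisy, axis=1, keepdims=True)
noisy[::2] *= -1.0                           # sign flips do not matter
q_avg = average_quaternions(noisy)
print(min(np.linalg.norm(q_avg - q_true), np.linalg.norm(q_avg + q_true)))
```

The per-pixel confidence scores of a dense prediction head can be plugged in as `weights`, so that uncertain pixels contribute less to the aggregated pose.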
Supervised Autonomy for Exploration and Mobile Manipulation in Rough Terrain with a Centaur-like Robot
Planetary exploration scenarios illustrate the need for autonomous robots that are capable of operating in unknown environments without direct human interaction. At the DARPA Robotics Challenge, we demonstrated that our Centaur-like mobile manipulation robot Momaro can solve complex tasks when teleoperated. Motivated by the DLR SpaceBot Cup 2015, in which robots had to explore a Mars-like environment, find and transport objects, take a soil sample, and perform assembly tasks, we developed autonomous capabilities for Momaro. Our robot perceives and maps previously unknown, uneven terrain using a 3D laser scanner. Based on the generated height map, we assess drivability, plan navigation paths, and execute them using the omnidirectional drive. Using its four legs, the robot adapts to the slope of the terrain. Momaro perceives objects with cameras, estimates their pose, and manipulates them with its two arms autonomously. For specifying missions, monitoring mission progress, on-the-fly reconfiguration, and teleoperation, we developed a ground station with suitable operator interfaces. To handle network communication interruptions and latencies between robot and ground station, we implemented a robust network layer for the ROS middleware. With the developed system, our team NimbRo Explorer solved all tasks of the DLR SpaceBot Camp 2015. We also discuss the lessons learned from this demonstration.
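The height-map-based drivability assessment mentioned above can be illustrated with a minimal sketch (our own example with assumed grid resolution and slope threshold, not the authors' planner): compute the local terrain slope by finite differences and threshold it.

```python
import numpy as np

def drivable_mask(height, cell_size, max_slope_deg=20.0):
    """height: 2D array of terrain heights in meters on a regular grid;
    returns a boolean mask of cells whose slope is within the limit."""
    dz_dy, dz_dx = np.gradient(height, cell_size)      # finite differences
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    return slope <= max_slope_deg

# Usage: flat ground on the left, a 45-degree ramp on the right half.
x = np.linspace(0.0, 5.0, 100)
profile = np.where(x < 2.5, 0.0, x - 2.5)
grid = np.tile(profile, (100, 1))
mask = drivable_mask(grid, cell_size=x[1] - x[0])
print(mask[:, :40].all(), mask[:, 60:].any())
```

A full navigation stack would combine such a slope mask with additional costs (roughness, step height, clearance) before planning paths over the resulting costmap.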